The hemagglutinating protein HA33 from Clostridium botulinum is
associated with the large secreted botulinum neurotoxin complexes and
is critical in toxin protection, internalization, and possibly activation. We
report the crystal structure of serotype A HA33 (HA33/A) at 1.5 Å
resolution, which reveals a unique domain organization and a carbohydrate
recognition site. In addition, sequence alignments of the other toxin
complex components, including the neurotoxin BoNT/A, hemagglutinating
protein HA17/A, and non-toxic non-hemagglutinating protein
NTNHA/A, suggest that most of the toxin complex consists of a
reoccurring β-trefoil fold.
Introduction

The Clostridium botulinum neurotoxins (BoNTs)
are among the most toxic bacterial toxins known.
Exposure to toxin prevents the release of acetylcholine
at neuromuscular junctions and synapses
by cleaving one of the three neuronal proteins of the
soluble N-ethylmaleimide-sensitive-factor attachment
protein receptor (SNARE) complex required
for synaptic vesicle membrane fusion, resulting in
flaccid muscle paralysis.1-4 Ironically, these toxins
are used clinically to treat neuromuscular disorders.

C. botulinum produces the 150 kDa BoNT concomitantly
with a group of non-toxic neurotoxin-associated
proteins (NAPs), forming complexes
known as progenitor toxins. Previous studies
have demonstrated that the NAPs protect BoNT
from acid denaturation in the stomach and attack
from a variety of proteolytic enzymes in the
gastrointestinal tract.5-8 Seven distinct BoNTs
have been identified and are referred to as serotypes
A-G, with serotype A being the most virulent to
humans.9 BoNT/A is secreted as a progenitor toxin
complex in one of three sizes, 12 S (300 kDa), 16 S
(500 kDa), or 19 S (900 kDa), depending on the
type and number of NAPs associated with the
complex.3,10 The 12 S progenitor toxin complex
consists of a single BoNT molecule and one non-toxic
non-hemagglutinin (NTNHA) protein, but
lacks the associated proteins responsible for
hemagglutination activity. The 16 S progenitor toxin, in
addition to the components found in the 12 S
complex, contains three hemagglutinin (HA) proteins,
HA70 (also referred to as HA3a and HA3b
after proteolysis), HA33, and HA17. The 19 S complex,
the secreted and most toxic form, is believed to be a
dimer of two 16 S toxins linked by an additional
HA33 protein.11

The  HA  positive  progenitor  toxins  have  been 
shown  to  bind  to  glycolipids  and  glycoproteins  of 
the  intestinal  microvilli  through  interactions  with 
oligosaccharides,  facilitating  internalization  and 
transport  in  the  bloodstream.12-14  Interestingly,  the 
oral  ingestion  of  the  progenitor  complexes  displays 
~100  times  more  toxicity  than  the  naked  BoNT 
protein.15  Though  the  molecular  details  leading 
to  the  higher  efficacy  of  the  progenitor  toxin 
complexes  remain  unknown,  it  has  been  proposed 
that  their  improved  effectiveness  may  be  due  to 
the  stabilization  and  protection  of  BoNT  by  the 
NAPs.15,16

The most prominent of the NAPs is HA33,
comprising up to ~30% of the 19 S progenitor
toxin complex mass, with binding specificity for
diverse carbohydrates, depending on the specific
Clostridium serotype and strain. For instance, the
HA33 protein of serotype A (HA33/A) binds
glycolipids and glycoproteins containing
galactose.13, 7,18 In comparison, the HA components
associated  with  BoNT/C  contain  two  distinct 
carbohydrate-binding  proteins,  HA33/C  (also 
referred  to  as  type  C  HA1)  and  the  C  terminus  of 
HA70/C  (HA3b)  that  both  recognize  sialic  acid- 
containing  glycolipids  and  glycoproteins,  albeit 
with  different  specificities.19  Little  is  known  about 
the  function  of  the  other  NAPs,  HA17  and  NTNHA. 
Here, we report the crystal structure of HA33/A
from C. botulinum, the first structure of a serotype A
NAP.


Results  and  Discussion 

HA33/A  structure 

HA33/A was isolated from the progenitor toxin
complex of the C. botulinum Hall strain and its X-ray
crystal structure (Figure 1) was determined to
1.50 Å resolution by molecular replacement using
the HA33/C structure (PDB code 1QXM)17 as the
search model. Data collection, model building and
refinement statistics are summarized in Table 1. The
final model includes two protein molecules, chains
A and B (residues 10-293), and 564 water molecules
in the asymmetric unit. The final R-factor is 16.8%
with an Rfree of 19.4%. The Ramachandran
plot, produced by MolProbity,20 shows that all
residues lie within allowed regions. No electron
density was observed for the first nine N-terminal
residues. This finding is consistent with the
observed post-translational modification, which
removes the first five N-terminal amino acid
residues.21 It is not known whether this modification
has any functional or serotype-specific
repercussions for HA33/A.

Each HA33/A molecule (Figure 1) consists of a
single polypeptide chain of 284 residues (10-293)
with an overall shape reminiscent of a dumbbell.
HA33/A contains two β-trefoil domains connected
by a short α-helix. The dimensions of the HA33/A
monomer are 70 Å × 40 Å × 37 Å, with an overall
surface area of 10,800 Å². The total β-strand and
α-helical content is 48% and 10%, respectively, which
is considerably lower than the β-strand
content predicted by FT-IR and far-UV circular
dichroism (74-77%).22 Each β-trefoil domain
consists of three homologous β-trefoil repeats that
are arranged about a pseudo 3-fold axis to form a
12-stranded anti-parallel β-barrel capped by three
β-hairpins. Each β-trefoil repeat is composed of four
β-strands with 1234 topology, with the second and
third strand separated by a β-hairpin and the other
two strands connected by loops of variable length.
These repeats, designated 1α, 1β, and 1γ for the
N-terminal domain and 2α, 2β, and 2γ for the
C-terminal domain, are composed of residues
10-55, 56-102, 103-144, 151-197, 198-245, and
246-293, respectively. The two domains of HA33/A are
highly similar to each other and can be superposed
with a Cα RMSD of 1.07 Å for 137 structurally
equivalent residues and 24% sequence identity,
according to the program TOP.23
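The Cα RMSD values quoted for domain superpositions are root-mean-square distances over structurally equivalent Cα atoms. The following is a minimal sketch of that final step only, assuming the two coordinate sets have already been optimally superposed and paired (the superposition and residue pairing performed by TOP are not reproduced here). The function name and the coordinates are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) C-alpha positions.

    Assumes the two sets are already superposed and listed in the same
    (structurally equivalent) order; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical coordinates (in angstroms), not taken from the HA33/A model.
domain_n = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
domain_c = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]

print(round(ca_rmsd(domain_n, domain_c), 2))
```

Because the RMSD is computed only over the paired atoms, the number of equivalent residues (137 for the two HA33/A domains) should always be reported alongside the value.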

Inter-domain  conformational  plasticity 

The structures of the two HA33/A molecules
present in the crystallographic asymmetric unit,


Figure 1. Crystal structure of
HA33/A. Ribbon diagram of
Clostridium botulinum HA33/A
Hall-A strain color-coded from N
terminus (blue) to C terminus (red)
showing the two domains connected
by a short helical linker.
The β-strands (β1-β24) and α-helix
(α1) are labeled. The putative
carbohydrate-binding site is indicated
with an arrow.
Table 1. Summary of crystallographic parameters, data collection and refinement statistics for HA33/A (PDB: 1YBI)

Data collection
  Space group                       P2₁2₁2
  Unit cell parameters (Å)          a = 104.08, b = 146.60, c = 35.71
  Wavelength (Å)                    0.9179
  Resolution range (Å)              25.00-1.50
  Number of observations            158,552
  Number of reflections             88,654
  Completeness (%)                  87.0 (78.6)a
  Mean I/σ(I)                       13.6 (3.7)a
  Rsym on I                         0.066 (0.299)a
  Sigma cutoff                      0.0
  Highest resolution shell (Å)      1.53-1.50

Model and refinement statistics
  Resolution range (Å)              21.22-1.50
  No. of reflections (total)        77,217
  No. of reflections (test)         3820
  Completeness (% total)            87.1
  Rcryst / Rfree                    0.168/0.194
  Stereochemical parameters
    Restraints (RMS observed)
      Bond lengths (Å)              0.009
      Bond angles (deg.)            1.24
  Average isotropic B-value (Å²)    23.2
  ESU based on R value (Å)          0.085
  Protein residues/atoms            568/4677
  Solvent molecules                 564

ESU, estimated standard uncertainties;47,51 $R_{sym} = \sum |I_i - \langle I_i \rangle| / \sum |I_i|$, where $I_i$ is the scaled intensity of the ith measurement, and $\langle I_i \rangle$ is the
mean intensity for that reflection. $R_{cryst} = \sum ||F_{obs}| - |F_{calc}|| / \sum |F_{obs}|$, where $F_{calc}$ and $F_{obs}$ are the calculated and observed structure factor
amplitudes, respectively. Rfree as for Rcryst, but for 5.0% of the total reflections chosen at random and omitted from refinement.
a Highest resolution shell.


chains A and B, are slightly different, most likely
due to domain flexibility, as indicated by a Cα
RMSD of 1.91 Å. However, the individual N and
C-terminal domains found in these two molecules are
more similar and superpose with a Cα RMSD of
0.52 Å and 0.89 Å, respectively, with the largest
differences in the C-terminal domains occurring at
loops 223-228 and 243-248. The major disparity in
the overall structures of the two molecules is due to
the alternate packing of the two trefoil domains
relative to the connecting helix linker that causes an
approximate 10° rotation of the C-terminal domain
with respect to the N-terminal domain (Figure 2(a)).
The focal point of the domain rotation is located


Figure 2. (a) Ribbon diagram of an N-terminal superposition of HA33/A chain A (red) and HA33/A chain B (green).
(b) Same superposition as (a), but superposition of HA33/A chain A (red) and HA33/C (white). (c) Same superposition
used above, but containing a close-up view of the helical linker region.


after the hydrophobic linker helix α1 (residues 145-150)
that balances between the two conformations
observed in the two HA33/A chains. In the A chain,
the inter-domain interactions are less extensive
and mostly hydrophilic, such as the water-facilitated
hydrogen-bonding of Glu143 and
Glu196. In the B chain, by contrast, hydrophobic
interactions predominate, with Ile146 and Leu150 of
the linker helix packing more deeply into a
hydrophobic pocket of the C-terminal domain
formed by residues Ile192, Ile240, Pro242 and
Tyr250 than that observed in chain A. Additional
interactions in the B chain include a bifurcated
hydrogen-bonding network between the side-chain
of Asp144 of the N-terminal domain with the amide
nitrogen atoms of Ile146 and Ile147 of helix α1 and
an inter-domain salt-bridge between Arg47 and
Asp247, which are not found in the chain A
molecule or the structure of HA33/C. Results
obtained from a normal mode analysis using the
Figure 3. (a) Sequence alignment of HA33 homologs. The alignment shows strict sequence conservation in white
letters and red background, and strong sequence conservation in red letters. The secondary structure elements of the
HA33/A structure are labeled α (α-helix), η (3₁₀ helix), β (β-strand) and TT (turn). The solvent-accessibility of each
residue in the HA33/A structure is indicated in the bar at the base of the sequences, with white representing buried
residues, dark blue representing solvent-accessible residues and light blue representing an intermediate value. The
residues at the putative carbohydrate-binding sites of HA33/A and HA33/C are indicated underneath with red stars
and blue circles, respectively. This Figure was prepared using ESPript.49 (b) Surface representation of HA33/A showing
conserved patches in green among HA33 homologs used in the sequence alignment in (a) indicating that the N-terminal
domain (top) is more highly conserved than the C-terminal domain (bottom); (c) same as (b), but rotated 180°; (d) same as
(b), but looking down the N-terminal domain; and (e) same as (b), but looking down the C-terminal domain with the
residues Asp263, Tyr265, Gln276, Phe278, and Asn285 forming the putative carbohydrate recognition site in red. This
Figure was prepared using the ConSurf server.50


ElNemo server24 are consistent with HA33/A flexibility
between the trefoil domains at the helical linker.
However, this inter-domain conformational flexibility
is not seen in HA33/C, as the two molecules
in its asymmetric unit superpose with a Cα RMSD of
0.53 Å. In this light, further studies are necessary to
conclude whether the two conformations observed
in the X-ray structure of HA33/A are functionally
important or merely a crystal packing artifact.

Of particular note is that the inter-domain
arrangement of the HA33/A molecules differs
significantly from that observed for the HA33/C
structure.1 A comparison of the two serotype
structures reveals an RMSD of 2.6 Å over 159
aligned residues (out of the 284 possible residues)
with 36% sequence identity.25 An approximately 60°
rotation of the C-terminal domain was observed in
the HA33/C structure as compared to the two
HA33/A molecules, despite the fact that the
N-terminal domains of these two serotypes superpose
closely with an RMSD of 0.52 Å (Figure 2(b) and
(c)). The dissimilarity of the structures found for
these two serotypes is focused immediately before
the linker helix α1, with the result that the two
trefoil domains of HA33/C are brought together.
This domain orientation dissimilarity may be
serotype-dependent, since HA33/C has a longer
N terminus located at the interface of the β-trefoil
domains that does not undergo the post-translational
cleavage (Figure 2(b)). The difference
in the N termini may contribute to the serotype size
differences in the progenitor toxin complexes,
particularly since the 19 S progenitor toxin complex
is produced only by serotype A and not by the other
serotypes. N-terminal sequence analysis has
revealed noteworthy serotype differences indicating
that HA33/C26 and HA33/D27 do not undergo
processing like that observed for HA33/A and
HA33/B, which are similarly proteolytically shortened
at their N termini.21 Since BoNT serotypes A
and B are both involved in human botulism, the
high level of sequence conservation of their HA33s
may reflect their similar specificities, activities, and
immuno-responses. Interestingly, the less toxic
serotype A2,3 E28 and F29 strains lack the genes
encoding the HA components, and thus produce
only 300 kDa 12 S toxin and have no ability to
assemble with HA proteins, while serotype G lacks
only the HA33 gene.30

Sequence  conservation  of  HA33  serotypes 

An alignment of HA33/A with nine sequences
from other HA33 serotypes and
strains is shown along with the secondary structure
of HA33/A in Figure 3(a). There is substantially
greater sequence conservation observed at the
surface of the N-terminal domain as compared to
the C-terminal domain, though the majority of the
conserved residues are solvent-inaccessible and
are presumably responsible for maintaining
the hydrophobic core of the β-trefoil fold
(Figure 3(a)-(e)), with the greatest conservation
being localized between the 1α and 1β repeats
(β-strands β4-β6) possessing 65% identity. Since it is
not clear which of the NAP components interact
with the BoNT, the greater sequence conservation in
the N-terminal domain of HA33/A suggests that this
region is likely to be important for protein-protein
interactions in the progenitor toxin complexes.
Previous studies have reported that HA33/A
accounts for most of the immunogenic response of
the progenitor toxin complexes,31 indicating that at
least part of the molecule is exposed in the complex.
Furthermore, C-terminally truncated variants of
HA33/C lose their hemagglutination and
erythrocyte-binding activity, suggesting that the C-terminal
domain contains the sugar-binding site.18 The lower
level of sequence conservation in the C-terminal
domain and the fact that this domain likely
possesses the carbohydrate-binding site collectively
suggest that the C-terminal HA33 domain is
solvent-exposed and the likely major contributor
to the distinct antigenicity of the serotypes.

Putative  HA33/A  carbohydrate-binding  site 

Recently, HA33/A has been proposed to contain
a single sugar-binding site at the 2γ trefoil repeat
specific for carbohydrates containing galactose, as
determined by isothermal titration calorimetry and
mutagenesis.17 In many instances, the β-trefoil fold
has been associated with oligosaccharide-binding
ability, leading to a characteristic HA activity,
including the plant toxin ricin, a prototypical
ganglioside-binding protein, and other proteins
like the ribosome-inactivating and potential cancer
therapeutic mistletoe lectin I. The ricin and mistletoe
lectin I structures revealed a domain architecture
that is similar to HA33/A with two β-trefoil
domains. The complex crystal structures of ricin
bound to lactose and mistletoe lectin I bound to
galactose revealed that only the 1α and 2γ repeats of
these two proteins are involved in carbohydrate
recognition.32,33 Based on the structural comparison
of HA33/A to ricin (PDB code 2AAI, with an
overall sequence identity of 14%), the 2γ trefoil
repeat (25% identity) of the HA33/A C-terminal
domain likely contains the sugar-recognition site.
An overlay of the residues of the 2γ repeat in the
HA33/A structure with the residues in the homologous
ricin lectin chain shows a spatial coincidence
in the lactose-binding site when residues
contained in a 4 Å sphere around lactose are
overlaid (Figure 4(a)). The key residues of Asp234,
Arg235, Glu239, Tyr248, and Asn255 of ricin have
counterparts in HA33/A residues Asp263, Tyr265,
Gln268, Phe278, and Asn285, respectively. This
putative carbohydrate-binding site in HA33/A
contains three structurally essential components
for galactoside-binding that have been proposed for
the ricin lectin chain.3 /34 Further support for the 2γ
sugar-binding site in HA33/A comes from mutants in
which Asp263 or Asn285 is replaced by Ala, which
lose their ability to bind carbohydrates.17
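As a quick consistency check on the assignment of the putative site to the 2γ repeat, the residue ranges given for the six trefoil repeats and the linker helix can be encoded in a small lookup table. The sketch below is illustrative only: the dictionary and function names are hypothetical, and the residue ranges and site residues are simply those quoted in the text.

```python
# Residue ranges of the HA33/A trefoil repeats and linker helix, as quoted
# in the text. The ASCII labels are illustrative stand-ins for 1α ... 2γ.
HA33A_REGIONS = {
    "1-alpha": (10, 55),
    "1-beta": (56, 102),
    "1-gamma": (103, 144),
    "linker-helix": (145, 150),
    "2-alpha": (151, 197),
    "2-beta": (198, 245),
    "2-gamma": (246, 293),
}

# Putative carbohydrate-binding residues listed in the text
# (Asp263, Tyr265, Gln268, Phe278, Asn285).
PUTATIVE_SITE = [263, 265, 268, 278, 285]


def locate_residue(number):
    """Return the region label containing a residue number, or None."""
    for label, (start, end) in HA33A_REGIONS.items():
        if start <= number <= end:
            return label
    return None


site_regions = {res: locate_residue(res) for res in PUTATIVE_SITE}
for res, region in site_regions.items():
    print(res, region)
```

All five residues fall within 246-293, in line with the proposal that only the 2γ repeat carries the sugar-recognition site; residues 1-9, which are absent from the model, map to no region.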

A comparison of the 2γ repeat of HA33/C with
the equivalent region in HA33/A (Figure 4(b))
shows that they are highly similar, suggesting that
HA33/C contains the necessary components for
carbohydrate binding, but with noted differences
that are likely to be important for ligand discrimination
of N-acetylneuraminic acid-containing
moieties by HA33/C. The residues of Asp263,
Tyr265, Gln268, and Asn285 in HA33/A responsible
for the putative carbohydrate recognition have
conserved counterparts in HA33/C residues
Asp256, Tyr258, Gln261, and Asn278, respectively.
In addition, within strand β23 at location 278 in
HA33/A (Hall) (Figure 3(a)) which has a Phe

Figure 4. (a) Close-up view of HA33/A superposed on
the ricin lactose-binding site at the 2γ repeat region.
Putative carbohydrate-binding residues as observed in
HA33/A (slate blue) and their counterparts as found in
the lactose-bound ricin structure (salmon). Residue labels
are indicated for HA33/A with those from the ricin
structure in parentheses. (b) Similar overlay as in (a), but a
close-up view at the 2γ repeat of HA33/A (slate blue)
superposed on HA33/C (white), with a modeled lactose
molecule (salmon) and with HA33/C labels in
parentheses.
Figure 5. (a) Scheme showing the β-trefoil content in the components of the 900 kDa progenitor toxin complex based
on the molecular composition of the type A toxin reported,11 although the exact stoichiometry is still under debate. (b)
Sequence alignment of HA17/A with HA33/A N and C-terminal trefoil domains. The alignment shows strict sequence
conservation in white letters and red background, and strong sequence conservation in red letters. The secondary
structure elements of the HA33/A N-terminal domain structure are labeled α (α-helix), η (3₁₀ helix), β (β-strand) and TT
(turn). The solvent accessibility of each residue is indicated in the bar displayed at the base of the sequences, with white
representing buried residues, dark blue representing solvent-accessible residues and light blue representing an
intermediate value. The residues at the putative carbohydrate-binding sites of HA33/A and HA33/C are indicated
underneath with red stars and blue circles, respectively. (c) Same as (b), but alignment of NTNHA with the β-trefoil
domains of TeNT and BoNT/A (Hall-A strain) with the secondary structure elements of the BoNT/A (PDB code 3BTA).
The residues at the receptor-binding sites of TeNT are indicated underneath with black circles.
residue, the sequences from the type C and D
HA33s (except Yoichi) have an Asp residue. This
variation might account for the differences in the
types of carbohydrates that bind to these two
groups of HA33s (i.e. A, B versus C, D). Further
support for the presence of a carbohydrate-binding
site at the 2γ repeat in HA33/C includes the fact
that a C-terminally truncated variant of HA33/C
lacking the 2γ repeat loses sugar-binding activity.18

A superposition of the HA33 1α repeats for the A
and C serotypes (not shown) reveals that HA33/A
is not expected to bind carbohydrates at this site.
Only the 2γ HA33/A β-trefoil motif possesses all of
the structural elements required for carbohydrate
recognition. The absence of a second carbohydrate-binding
site indicates that the HA activity of the
16 S and 19 S progenitor toxin complexes requires
HA33/A oligomerization, either with itself or with
other NAPs, in order to create multivalent sugar-binding
sites as previously suggested.35

Role  of  HA33/A  protein  in  the  progenitor  toxin 
complexes 

Previous studies suggest that the HA33/A
protein is a dimer in solution as determined by
gel filtration and mass spectrometry.22 Secondly,
analysis of the protein stoichiometry in the 16 S and
19 S progenitor toxin complexes led to the hypothesis
that a HA33/A dimer links two molecules of
16 S neurotoxin complex to form the 19 S progenitor
toxin complex.11,36 However, an analysis of the
crystal packing of the HA33/A molecules reveals
only limited surface contacts between the two
molecules. This interface accounts for only 10.1%
of the buried surface area, consisting of mostly
hydrophilic interactions such as residues Gln34,
Asn76, Pro78, Thr79, Asn81, Gln85, His90, Lys128,
and Thr131 of the N-terminal domain from one
HA33/A molecule interacting with residues Met64,
Ile66, His67, Asp245, and Asp247 of both domains
from the other HA33/A molecule. Given such a
small and weak dimer interface, it is unlikely that
the crystallographic interface is physiologically
significant or that it cross-links two molecules
of 16 S toxin to form the 19 S progenitor toxin
complex. Furthermore, to our knowledge the
crystallographic dimer interface is unlike that seen
in other β-trefoil-containing structures. Nonetheless,
further functional studies are needed to
conclusively determine the structural role of
HA33/A in forming the assembled progenitor
toxin complexes.

Reoccurring β-trefoil fold in the progenitor toxin
complex

In addition to the β-trefoil domains found in
HA33, BoNT/A and BoNT/B have previously
been shown to contain a single β-trefoil fold at
the C-terminal binding domain of the heavy
chain.37,38 Fold recognition methods also
detect this β-trefoil domain in the NAPs HA17
and NTNHA (FFAS scores −23 and −208, respectively;
scores below −9.5 typically indicate significant
similarity with less than 3% of false
positives).39 Based on this analysis, the β-trefoil
motif collectively forms nearly half of the mass
of the 900 kDa progenitor toxin serotype A
complex (Figure 5(a)). The importance of sequence
conservation of BoNT with this fold and ganglioside
recognition has been reported;40 however, the
significance of the trefoil fold within the context of
the NAPs and the progenitor complex has yet to
be addressed. Based on the HA33/A structure, a
sequence alignment of HA17/A with the N and
C-terminal domains of HA33/A (each with an overall
sequence similarity of 28%) allows the β-trefoil
repeats in HA17/A to be identified and assessed for
sugar-binding conservation (Figure 5(b)). Most of
the sequence conservation between HA17/A and
HA33/A N and C-terminal domains is located in
the γ repeats. Three residues that are important
for carbohydrate recognition in the HA33/A
C-terminal domain, Asn263, Tyr265, and Asn285, are
conserved with Asn113, Tyr115, and Asn138 in
HA17/A. However, the three other residues of
Gln268, Gln277, and Phe279 of HA33/A that form
the rest of the carbohydrate-binding site have no
equivalent counterpart in Ile121, Leu129, and
Asn131 of HA17/A, suggesting that HA17/A lacks
the necessary molecular requirements for sugar-binding.
Further support that HA17 does not bind
carbohydrates comes from experiments with GST
fusion proteins of HA17/A and HA17/C, which
did not bind to erythrocytes and intestinal
microvilli.12,19 In addition, it has been reported that
NTNHA also does not possess HA activity and that
it does not bind to erythrocytes, but NTNHA is a
critical component in formation of HA-positive
progenitor toxin complexes.43 NTNHA shows
significant sequence similarity to the β-trefoil-containing
binding domains of BoNT/A and
tetanus neurotoxin (TeNT) from Clostridium tetani
with sequence identities in the α and β repeats of
16% and 18%, respectively, and an overall sequence
similarity of 31%. The sequence comparison of their
β-trefoil subdomains (Figure 5(c)) offers clues into
why NTNHA does not bind carbohydrates. The
crystal structure of TeNT bound to a GT1b
ganglioside receptor analog (Gal4-GalNAc3) at the
γ repeat provides a structural prototype for the
characterization of the ganglioside binding site,41
and has proven crucial in identifying the
ganglioside-binding site for BoNT/A and BoNT/B. 7,42 The
TeNT key residues of Ser1287, Trp1289, and Tyr1290
are conserved in BoNT/A with residues Ser1264,
Trp1266, and Tyr1267, but are poorly conserved in
NTNHA with corresponding residues of Asn1180,
Asp1182, and Met1183.

The identification of the β-trefoil domain in a
majority of the components of the 900 kDa serotype
A progenitor toxin suggests this fold is a result of
structural domain duplication. It also allows one to
postulate models in determining the molecular
composition of the components of the progenitor
toxin complexes. With most of the molecular pieces
of the proverbial progenitor toxin puzzle now at
hand, our future work will focus on determining
the intimate protein-protein contacts that make up
the progenitor toxin complexes essential for BoNT
protection, uptake, and activation.


Materials  and  Methods 

Protein  production  and  crystallization 

HA33/A from the C. botulinum Hall strain was purified
as described,7 and kindly provided by the DasGupta/
Johnson laboratories at the University of Wisconsin.
Briefly, the ammonium sulfate-precipitated protein was
dissolved by dialysis against a buffer containing 10 mM
sodium acetate (pH 5.5), 200 mM (NH4)2SO4 and
concentrated to 12 mg/ml by ultrafiltration. The protein was
crystallized by the hanging-drop, vapor-diffusion method
using equal volumes of protein and reservoir solution.
The crystallization reservoir solution contained 20% (w/v)
PEG 4000, 15% (v/v) isopropanol, 200 mM Li2SO4, and
0.1 M Hepes at pH 7.5. Crystals were stabilized in freshly
prepared reservoir solution containing 30% isopropanol
for approximately 15 seconds prior to cryo-cooling in
liquid nitrogen. The crystals were indexed in the
orthorhombic space group P2₁2₁2 (Table 1).

Data  collection 

Diffraction data were collected at Stanford Synchrotron
Radiation Laboratory (SSRL, Stanford, USA) on beamline
9-1 (Table 1). Data were integrated, reduced, and scaled
using Denzo and Scalepack.44 Data statistics are
summarized in Table 1.

Structure  solution  and  refinement 

Four homology models of the individual HA33/A
β-trefoil domains were constructed with program
Whatif,45 based on the FFAS41 alignment with HA33/C
(PDB code 1QXM, sequence identity 38%). Multiple
molecular replacement searches were carried out in
program MOLREP46 on an 80-CPU Linux cluster expecting
four copies of a single HA33/A domain in the asymmetric
unit. Out of 160 MR trials, only four, obtained with a model
based on the N-terminal domain of the HA33/C (chain B)
structure, had values of Rfree below 0.50 after rigid body
and restrained refinement in Refmac5.47 Subsequent
manual rebuilding and refinement were carried out in
programs O48 and Refmac5. Refinement statistics are
summarized in Table 1. The final model includes two
protein molecules and 564 water molecules. No electron
density was observed for residues 1-9. Analysis of the
stereochemical quality of the model was accomplished
using the AutoDepInputTool†. Figures were prepared
with PYMOL (DeLano Scientific).
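The R-factors used to monitor refinement follow the definition in the footnote to Table 1, Rcryst = Σ||Fobs| − |Fcalc|| / Σ|Fobs|, with Rfree computed in the same way over the test reflections omitted from refinement. The sketch below illustrates that formula only; it is not how Refmac5 computes its statistics internally (scaling between observed and calculated amplitudes is assumed to have been applied already), and the function name and amplitude values are hypothetical.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R-factor: sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|).

    Both sequences must list amplitudes for the same reflections in the
    same order and are assumed to be on a common scale.
    """
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("amplitude lists must be non-empty and equal in length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    if denominator == 0:
        raise ValueError("sum of observed amplitudes is zero")
    return numerator / denominator


# Hypothetical structure factor amplitudes, for illustration only.
f_obs_work = [100.0, 50.0, 25.0, 25.0]
f_calc_work = [90.0, 55.0, 25.0, 30.0]

print(r_factor(f_obs_work, f_calc_work))
```

Applying the same function to the working set and to the test set separately gives Rcryst and Rfree, respectively; a large gap between the two is commonly taken as a sign of overfitting.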

Ligand  docking 

The probable binding site for the carbohydrate substrate
was obtained by superimposing the β-trefoil
domain of the lactose-bound ricin structure34 with the
C-terminal trefoil domain of HA33/A using the program
TOP.23

† http://deposit.pdb.org/adit/


Sequence  alignments 

NAP sequences were aligned using FFAS41 and
ClustalW.48 A gap opening penalty of ten, gap extension penalty
of 0.05, and gap separation distance of eight were used
with the BLOSUM62 matrix. The alignment Figures were
prepared using ESPript using DSSP secondary structure
assignments 4
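The percent identities quoted throughout (for example 24% between the two HA33/A domains) are derived from pairwise alignments such as those described above. The following is a minimal sketch of one common convention, assuming identity is counted over columns in which neither sequence has a gap. Other programs normalize differently (for instance by the length of the shorter sequence), so this may not reproduce the exact values reported by TOP, FFAS, or ClustalW. The function name and the aligned fragments are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned, gap-free columns of a pairwise alignment.

    Both strings must come from the same alignment and therefore have the
    same length; columns with a gap in either sequence are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    aligned_columns = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue
        aligned_columns += 1
        if res_a == res_b:
            identical += 1
    if aligned_columns == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / aligned_columns


# Hypothetical aligned fragments, not taken from the HA33 alignment.
fragment_a = "DYQ-FN"
fragment_b = "DYQAFD"

print(percent_identity(fragment_a, fragment_b))
```

Stating the denominator explicitly matters when comparing identities across methods, since the same alignment can yield noticeably different percentages under different conventions.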

Protein  Data  Bank  accession  code 

Atomic  coordinates  and  experimental  structure  factors 
of  HA33/A  have  been  deposited  with  the  PDB  and  are 
accessible  under  the  code  1YBI. 